Aggregations cancellation after collection #120944
Conversation
Hi @not-napoleon, I've created a changelog YAML for you.
@not-napoleon I think we should also update the
I have performance data! These numbers come from esbench, and it's been a while since I did a deep dive into how we get them, but it sure looks like the overhead is in the ballpark of a couple of percent for this super deep agg, the worst case. I'm rerunning rally again to get the results for all aggs. I expect we'll only see a performance hit on the super deep aggs.
Here's the rally track, a worst case scenario for this fix: elastic/rally-tracks#733
Here's some more timing information. This time the "after" came in faster, which makes me think I got a machine with quieter neighbors.
In that last test, indexing time got faster with our change, which doesn't touch indexing. My guess is that the first run had a noisier neighbor, though it's possible something else is different. That speed-up seems to be reflected in some runs.
Non-aggs: [chart]
And aggs: [chart]
But some aggs aren't faster: [chart]
Our agg did get faster: [chart]
I'm going to say "noise". This gives me the fuzzy feeling that we're OK merging this and watching for a regression in the nightlies. It'll probably take a bit to see.
    int maxBucket,
    boolean isInSortOrderExecutionRequired,
    MappedFieldType... fieldTypes
);
That's a ton of stuff to pass in. I feel like some of it, like CircuitBreakerService, should be part of the implementation. Maybe?
I think it's fine actually - it lines up with how everything else is working now.
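For illustration only, here is a hypothetical sketch of what the suggested alternative could look like: the test helper owns a default breaker service and exposes an override hook, so callers only pass what varies per test. All names here (BreakerWiringSketch, NoopBreakerService, runAggregationTest) are made up for the sketch and are not the AggregatorTestCase API.

```java
// Hypothetical sketch: the helper owns a default CircuitBreakerService
// instead of requiring every caller to pass one in.
class BreakerWiringSketch {

    interface CircuitBreakerService {
        void checkBudget(long bytes);
    }

    static class NoopBreakerService implements CircuitBreakerService {
        @Override
        public void checkBudget(long bytes) {
            // never trips; a reasonable default for most tests
        }
    }

    // The helper owns the default; breaker-specific tests can opt in to a real one.
    private CircuitBreakerService breakerService = new NoopBreakerService();

    void overrideBreakerService(CircuitBreakerService service) {
        this.breakerService = service;
    }

    void runAggregationTest(int maxBucket) {
        breakerService.checkBudget(maxBucket * 1024L);
        // ... build and run the aggregator under test ...
    }

    public static void main(String[] args) {
        new BreakerWiringSketch().runAggregationTest(10);
    }
}
```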
-    for (long owningBucketOrd = 0; owningBucketOrd <= oldOrds.maxOwningBucketOrd(); owningBucketOrd++) {
+    long maxOwning = oldOrds.maxOwningBucketOrd();
+    for (long owningBucketOrd = 0; owningBucketOrd <= maxOwning; owningBucketOrd++) {
+        if (context.isCancelled()) {
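To make the shape of that change concrete, here is a small self-contained sketch of the pattern in plain Java: hoist the loop bound out of the loop header, then check for cancellation on each owning bucket ordinal. Context, TaskCancelledException, and rewriteBuckets are illustrative stand-ins, not the actual Elasticsearch classes.

```java
// Illustrative sketch of the hoisted-bound loop with a cancellation check.
class CancellationDemo {

    static class Context {
        // In the real code this is backed by the task's cancellation state;
        // the volatile read is cheap but not free, hence checking sparingly.
        private volatile boolean cancelled = false;

        boolean isCancelled() {
            return cancelled;
        }

        void cancel() {
            cancelled = true;
        }
    }

    static class TaskCancelledException extends RuntimeException {
        TaskCancelledException(String message) {
            super(message);
        }
    }

    static void rewriteBuckets(Context context, long[] ords) {
        long maxOwning = ords.length - 1; // stands in for oldOrds.maxOwningBucketOrd()
        for (long owningBucketOrd = 0; owningBucketOrd <= maxOwning; owningBucketOrd++) {
            if (context.isCancelled()) {
                throw new TaskCancelledException("cancelled building aggregation results");
            }
            // ... per-bucket rewrite work would go here ...
        }
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.cancel();
        try {
            rewriteBuckets(context, new long[] { 1, 2, 3 });
        } catch (TaskCancelledException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```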
Nit: It is not clear to me why we are handling auto-date histogram in a special way. Could we add a comment explaining why?
Sure! It's just that it spends a while spinning around doing extra stuff before it's ready for the normal build.
And I found it via an actual bug report from folks who were trying to terminate aggs while the thread was stuck here.
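For readers following along: a toy sketch of why a pre-build merging phase like auto-date histogram's needs its own check. It loops through merge rounds before the shared sub-agg build path ever runs, so a check placed only in that shared path would never fire here. The names (mergeRounds, Cancellation, the halving) are made up for illustration and do not reflect the real implementation.

```java
// Toy model of a bucket-merging phase that runs before the normal build.
class AutoDateMergeSketch {

    interface Cancellation {
        boolean isCancelled();
    }

    // Repeatedly coarsens until few enough buckets remain; each round is a
    // pass over all buckets, which is where a cancelled task could get stuck.
    static int mergeRounds(long buckets, long targetBuckets, Cancellation cancellation) {
        int rounds = 0;
        while (buckets > targetBuckets) {
            if (cancellation.isCancelled()) {
                throw new RuntimeException("cancelled during bucket merging");
            }
            buckets /= 2; // stand-in for merging to the next coarser interval
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(mergeRounds(1_000_000, 100, () -> false) + " rounds");
    }
}
```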
Thanks
You have not one but three typos in that sentence :D
LGTM. It is not clear to me why we have a special case for auto date histogram, but it should be fine anyway.
💔 Backport failed
You can use sqren/backport to manually backport by running
This PR addresses issues around aggregations cancellation, mentioned in #108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade-off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases we've seen involve building deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.
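As an illustration of that centralized check, here is a simplified, self-contained sketch. The class and method names (ResultBuilder, buildSubAggs, checkCancelled) are assumptions for the sketch, not the actual Elasticsearch code.

```java
import java.util.function.BooleanSupplier;

// Simplified sketch of checking cancellation once per sub-aggregation build.
class SubAggCancellationSketch {

    static class TaskCancelledException extends RuntimeException {
        TaskCancelledException(String message) {
            super(message);
        }
    }

    static class ResultBuilder {
        private final BooleanSupplier isCancelled;

        ResultBuilder(BooleanSupplier isCancelled) {
            this.isCancelled = isCancelled;
        }

        // One centralized choke point: every level of the result tree passes
        // through here when building its sub-aggregations, so a single check
        // covers deep trees without touching the per-document collect path.
        private void checkCancelled() {
            if (isCancelled.getAsBoolean()) { // in real code, a volatile read
                throw new TaskCancelledException("cancelled while building aggregation results");
            }
        }

        // Builds one level of the result tree, recursing for nested sub-aggs.
        Object buildSubAggs(int depth) {
            checkCancelled(); // once per sub-aggregation build, not per document
            if (depth == 0) {
                return "leaf";
            }
            return new Object[] { buildSubAggs(depth - 1) };
        }
    }

    public static void main(String[] args) {
        ResultBuilder builder = new ResultBuilder(() -> true); // already cancelled
        try {
            builder.buildSubAggs(100);
        } catch (TaskCancelledException e) {
            System.out.println("aborted early: " + e.getMessage());
        }
    }
}
```

The appeal of the single choke point is that the cost stays proportional to the number of result-tree nodes rather than the number of documents collected.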
I'm opening this PR as a draft while we work on testing for this change, to have a place for discussion and to add tests. There are three major pieces of testing we need to do.
Nik and I will be adding tests to this PR over the next few days as we work on it.
Fixed #108701